if(!require("rworldmap")){
  install.packages("rworldmap")
}
## Loading required package: rworldmap
## Warning: package 'rworldmap' was built under R version 3.4.4
## Loading required package: sp
## ### Welcome to rworldmap ###
## For a short introduction type :   vignette('rworldmap')
if(!require("gridExtra")){
  install.packages("gridExtra")
}
## Loading required package: gridExtra
if(!require("scatterD3")){
  install.packages("scatterD3")
}
## Loading required package: scatterD3
## Warning: package 'scatterD3' was built under R version 3.4.4
if(!require("ggthemes")){
  install.packages("ggthemes")
}
## Loading required package: ggthemes
## Warning: package 'ggthemes' was built under R version 3.4.4
if(!require("openxlsx")){
  install.packages("openxlsx")
}
## Loading required package: openxlsx
library(openxlsx)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following object is masked from 'package:gridExtra':
## 
##     combine
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.4.4
library(ggthemes)
library(rworldmap)
library(tidyverse)
## -- Attaching packages ----------------------------------------------------------------------------- tidyverse 1.2.1 --
## v tibble  1.4.2     v purrr   0.2.4
## v tidyr   0.8.0     v stringr 1.2.0
## v readr   1.1.1     v forcats 0.2.0
## -- Conflicts -------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::combine() masks gridExtra::combine()
## x dplyr::filter()  masks stats::filter()
## x dplyr::lag()     masks stats::lag()
library(gridExtra)
library(scatterD3)

1 Introduction (including choice of data, questions), Team description

Terrorist attacks have been a major issue around the globe, not only in the 21st century but in past decades as well. We want to study how the pattern of terrorist attacks in the most recent five years of data (2012-2016) differs from the pattern in the early 1970s (1970-1974), and to explore what has changed and why. The data we use is the Global Terrorism Database (GTD) from http://start.umd.edu/gtd/.

Team members’ contribution description:
- Huijun Cui: Data quality analysis, Exploratory data analysis, Interactive
- Xiangyu Liu: Description of data, Exploratory data analysis, Interactive, Conclusion
- Xiaohan Yi: Introduction, Exploratory data analysis, Executive summary, Interactive

2 Description of Data

We obtained the dataset from the Global Terrorism Database (GTD), http://start.umd.edu/gtd/, an open-source database with information on terrorist events around the world from 1970 through 2016 (with annual updates planned for the future). Unlike many other event databases, the GTD includes systematic data on domestic as well as international terrorist incidents that occurred during this period, and it now includes more than 170,000 cases. The dataset ends in 2016 because collection beyond 2016 is ongoing and the website is updated annually. The GTD team expects to release the 2017 data in Summer 2018.

Statistical information contained in the Global Terrorism Database is based on reports from a variety of open media sources. Information is not added to the GTD unless and until the GTD team has determined that the sources are credible. We should not infer any additional actions or results beyond what is presented in a GTD entry; specifically, we should not infer that an individual associated with a particular incident was tried and convicted of terrorism or any other criminal offense. If new documentation about an event becomes available, an entry may be modified as necessary and appropriate.

In our project, we divide the whole dataset into two subgroups: the data from 1970-1974 and the data from 2012-2016. The 1970-1974 subset has 2670 observations of 135 variables. The 2012-2016 subset has the same 135 variables, and the two subsets together contain 68,366 observations (see Section 3.1). We would like to carry out data visualization and data analysis on these subgroups individually and compare the results. For example, we would like to know the difference in the total number of terrorist attacks by region and by year, which may be related to policy issues, motivations, religion, etc.

3 Graphical Analysis of Data Quality

terrorism <- read.xlsx("../data/raw data/globalterrorismdb_0617dist.xlsx")
terrorism1<-terrorism[terrorism$iyear<=1974,]
#write.csv(terrorism1,"terrorism1.csv")
terrorism2<-terrorism[terrorism$iyear>=2012,]
#write.csv(terrorism2,"terrorism2.csv")
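
An alternative to keeping two separate data frames is to label each row with its period, which makes side-by-side tables easier later on. The following is only a minimal sketch in base R: `toy` is a made-up stand-in for the GTD table, and the `period` column and `both_periods` name are our own illustrative choices, not part of the dataset.

```r
# Toy stand-in for the GTD table; only the `iyear` column is needed here.
toy <- data.frame(iyear = c(1970, 1974, 1975, 2011, 2012, 2016))

# Label each row with its comparison period; years outside both windows become NA.
toy$period <- ifelse(toy$iyear <= 1974, "1970-1974",
              ifelse(toy$iyear >= 2012, "2012-2016", NA))

# Keep only the rows that fall into one of the two periods.
both_periods <- toy[!is.na(toy$period), ]

# Number of rows per period.
period_counts <- table(both_periods$period)
period_counts
```

With the real data, the same labelling could be applied to `terrorism` in place of `toy`.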

3.1 discover what’s in the data file that was exported

head(terrorism)
##       eventid iyear imonth iday approxdate extended resolution country
## 1 1.97000e+11  1970      7    2       <NA>        0         NA      58
## 2 1.97000e+11  1970      0    0       <NA>        0         NA     130
## 3 1.97001e+11  1970      1    0       <NA>        0         NA     160
## 4 1.97001e+11  1970      1    0       <NA>        0         NA      78
## 5 1.97001e+11  1970      1    0       <NA>        0         NA     101
## 6 1.97001e+11  1970      1    1       <NA>        0         NA     217
##          country_txt region                  region_txt provstate
## 1 Dominican Republic      2 Central America & Caribbean      <NA>
## 2             Mexico      1               North America      <NA>
## 3        Philippines      5              Southeast Asia    Tarlac
## 4             Greece      8              Western Europe    Attica
## 5              Japan      4                   East Asia      <NA>
## 6      United States      1               North America  Illinois
##            city latitude longitude specificity vicinity location
## 1 Santo Domingo 18.45679 -69.95116           1        0     <NA>
## 2   Mexico city 19.43261 -99.13321           1        0     <NA>
## 3       Unknown 15.47860 120.59974           4        0     <NA>
## 4        Athens 37.98377  23.72816           1        0     <NA>
## 5       Fukouka 33.58041 130.39636           1        0     <NA>
## 6         Cairo 37.00511 -89.17627           1        0     <NA>
##                                                                                                                                                                                                                                                                                                                                                    summary
## 1                                                                                                                                                                                                                                                                                                                                                     <NA>
## 2                                                                                                                                                                                                                                                                                                                                                     <NA>
## 3                                                                                                                                                                                                                                                                                                                                                     <NA>
## 4                                                                                                                                                                                                                                                                                                                                                     <NA>
## 5                                                                                                                                                                                                                                                                                                                                                     <NA>
## 6 1/1/1970: Unknown African American assailants fired several bullets at police headquarters in Cairo, Illinois, United States.  There were no casualties, however, one bullet narrowly missed several police officers.  This attack took place during heightened racial tensions, including a Black boycott of White-owned businesses, in Cairo Illinois.
##   crit1 crit2 crit3 doubtterr alternative alternative_txt multiple success
## 1     1     1     1         0          NA            <NA>        0       1
## 2     1     1     1         0          NA            <NA>        0       1
## 3     1     1     1         0          NA            <NA>        0       1
## 4     1     1     1         0          NA            <NA>        0       1
## 5     1     1     1        -9          NA            <NA>        0       1
## 6     1     1     1         0          NA            <NA>        0       1
##   suicide attacktype1                attacktype1_txt attacktype2
## 1       0           1                  Assassination          NA
## 2       0           6    Hostage Taking (Kidnapping)          NA
## 3       0           1                  Assassination          NA
## 4       0           3              Bombing/Explosion          NA
## 5       0           7 Facility/Infrastructure Attack          NA
## 6       0           2                  Armed Assault          NA
##   attacktype2_txt attacktype3 attacktype3_txt targtype1
## 1            <NA>          NA            <NA>        14
## 2            <NA>          NA            <NA>         7
## 3            <NA>          NA            <NA>        10
## 4            <NA>          NA            <NA>         7
## 5            <NA>          NA            <NA>         7
## 6            <NA>          NA            <NA>         3
##                 targtype1_txt targsubtype1
## 1 Private Citizens & Property           68
## 2     Government (Diplomatic)           45
## 3         Journalists & Media           54
## 4     Government (Diplomatic)           46
## 5     Government (Diplomatic)           46
## 6                      Police           22
##                                       targsubtype1_txt
## 1                                       Named Civilian
## 2 Diplomatic Personnel (outside of embassy, consulate)
## 3                      Radio Journalist/Staff/Facility
## 4                                    Embassy/Consulate
## 5                                    Embassy/Consulate
## 6      Police Building (headquarters, station, school)
##                         corp1                   target1 natlty1
## 1                        <NA>              Julio Guzman      58
## 2 Belgian Ambassador Daughter   Nadine Chaval, daughter      21
## 3            Voice of America                  Employee     217
## 4                        <NA>              U.S. Embassy     217
## 5                        <NA>            U.S. Consulate     217
## 6     Cairo Police Department Cairo Police Headquarters     217
##          natlty1_txt targtype2 targtype2_txt targsubtype2 targsubtype2_txt
## 1 Dominican Republic        NA          <NA>           NA             <NA>
## 2            Belgium        NA          <NA>           NA             <NA>
## 3      United States        NA          <NA>           NA             <NA>
## 4      United States        NA          <NA>           NA             <NA>
## 5      United States        NA          <NA>           NA             <NA>
## 6      United States        NA          <NA>           NA             <NA>
##   corp2 target2 natlty2 natlty2_txt targtype3 targtype3_txt targsubtype3
## 1  <NA>    <NA>      NA        <NA>        NA          <NA>           NA
## 2  <NA>    <NA>      NA        <NA>        NA          <NA>           NA
## 3  <NA>    <NA>      NA        <NA>        NA          <NA>           NA
## 4  <NA>    <NA>      NA        <NA>        NA          <NA>           NA
## 5  <NA>    <NA>      NA        <NA>        NA          <NA>           NA
## 6  <NA>    <NA>      NA        <NA>        NA          <NA>           NA
##   targsubtype3_txt corp3 target3 natlty3 natlty3_txt
## 1             <NA>  <NA>    <NA>      NA        <NA>
## 2             <NA>  <NA>    <NA>      NA        <NA>
## 3             <NA>  <NA>    <NA>      NA        <NA>
## 4             <NA>  <NA>    <NA>      NA        <NA>
## 5             <NA>  <NA>    <NA>      NA        <NA>
## 6             <NA>  <NA>    <NA>      NA        <NA>
##                                gname gsubname gname2 gsubname2 gname3
## 1                             MANO-D     <NA>   <NA>      <NA>   <NA>
## 2 23rd of September Communist League     <NA>   <NA>      <NA>   <NA>
## 3                            Unknown     <NA>   <NA>      <NA>   <NA>
## 4                            Unknown     <NA>   <NA>      <NA>   <NA>
## 5                            Unknown     <NA>   <NA>      <NA>   <NA>
## 6                 Black Nationalists     <NA>   <NA>      <NA>   <NA>
##   gsubname3                                         motive guncertain1
## 1      <NA>                                           <NA>           0
## 2      <NA>                                           <NA>           0
## 3      <NA>                                           <NA>           0
## 4      <NA>                                           <NA>           0
## 5      <NA>                                           <NA>           0
## 6      <NA> To protest the Cairo Illinois Police Deparment           0
##   guncertain2 guncertain3 individual nperps nperpcap claimed claimmode
## 1          NA          NA          0     NA       NA      NA        NA
## 2          NA          NA          0      7       NA      NA        NA
## 3          NA          NA          0     NA       NA      NA        NA
## 4          NA          NA          0     NA       NA      NA        NA
## 5          NA          NA          0     NA       NA      NA        NA
## 6          NA          NA          0    -99      -99       0        NA
##   claimmode_txt claim2 claimmode2 claimmode2_txt claim3 claimmode3
## 1          <NA>     NA         NA           <NA>     NA         NA
## 2          <NA>     NA         NA           <NA>     NA         NA
## 3          <NA>     NA         NA           <NA>     NA         NA
## 4          <NA>     NA         NA           <NA>     NA         NA
## 5          <NA>     NA         NA           <NA>     NA         NA
## 6          <NA>     NA         NA           <NA>     NA         NA
##   claimmode3_txt compclaim weaptype1             weaptype1_txt
## 1           <NA>        NA        13                   Unknown
## 2           <NA>        NA        13                   Unknown
## 3           <NA>        NA        13                   Unknown
## 4           <NA>        NA         6 Explosives/Bombs/Dynamite
## 5           <NA>        NA         8                Incendiary
## 6           <NA>        NA         5                  Firearms
##   weapsubtype1       weapsubtype1_txt weaptype2 weaptype2_txt weapsubtype2
## 1           NA                   <NA>        NA          <NA>           NA
## 2           NA                   <NA>        NA          <NA>           NA
## 3           NA                   <NA>        NA          <NA>           NA
## 4           16 Unknown Explosive Type        NA          <NA>           NA
## 5           NA                   <NA>        NA          <NA>           NA
## 6            5       Unknown Gun Type        NA          <NA>           NA
##   weapsubtype2_txt weaptype3 weaptype3_txt weapsubtype3 weapsubtype3_txt
## 1             <NA>        NA          <NA>           NA             <NA>
## 2             <NA>        NA          <NA>           NA             <NA>
## 3             <NA>        NA          <NA>           NA             <NA>
## 4             <NA>        NA          <NA>           NA             <NA>
## 5             <NA>        NA          <NA>           NA             <NA>
## 6             <NA>        NA          <NA>           NA             <NA>
##   weaptype4 weaptype4_txt weapsubtype4 weapsubtype4_txt
## 1        NA          <NA>           NA             <NA>
## 2        NA          <NA>           NA             <NA>
## 3        NA          <NA>           NA             <NA>
## 4        NA          <NA>           NA             <NA>
## 5        NA          <NA>           NA             <NA>
## 6        NA          <NA>           NA             <NA>
##                     weapdetail nkill nkillus nkillter nwound nwoundus
## 1                         <NA>     1      NA       NA      0       NA
## 2                         <NA>     0      NA       NA      0       NA
## 3                         <NA>     1      NA       NA      0       NA
## 4                    Explosive    NA      NA       NA     NA       NA
## 5                   Incendiary    NA      NA       NA     NA       NA
## 6 Several gunshots were fired.     0       0        0      0        0
##   nwoundte property propextent              propextent_txt propvalue
## 1       NA        0         NA                        <NA>        NA
## 2       NA        0         NA                        <NA>        NA
## 3       NA        0         NA                        <NA>        NA
## 4       NA        1         NA                        <NA>        NA
## 5       NA        1         NA                        <NA>        NA
## 6        0        1          3 Minor (likely < $1 million)        NA
##   propcomment ishostkid nhostkid nhostkidus nhours ndays divert
## 1        <NA>         0       NA         NA     NA    NA   <NA>
## 2        <NA>         1        1          0     NA    NA   <NA>
## 3        <NA>         0       NA         NA     NA    NA   <NA>
## 4        <NA>         0       NA         NA     NA    NA   <NA>
## 5        <NA>         0       NA         NA     NA    NA   <NA>
## 6        <NA>         0       NA         NA     NA    NA   <NA>
##   kidhijcountry ransom ransomamt ransomamtus ransompaid ransompaidus
## 1          <NA>      0        NA          NA         NA           NA
## 2        Mexico      1     8e+05          NA         NA           NA
## 3          <NA>      0        NA          NA         NA           NA
## 4          <NA>      0        NA          NA         NA           NA
## 5          <NA>      0        NA          NA         NA           NA
## 6          <NA>      0        NA          NA         NA           NA
##   ransomnote hostkidoutcome hostkidoutcome_txt nreleased
## 1       <NA>             NA               <NA>        NA
## 2       <NA>             NA               <NA>        NA
## 3       <NA>             NA               <NA>        NA
## 4       <NA>             NA               <NA>        NA
## 5       <NA>             NA               <NA>        NA
## 6       <NA>             NA               <NA>        NA
##                                                                           addnotes
## 1                                                                             <NA>
## 2                                                                             <NA>
## 3                                                                             <NA>
## 4                                                                             <NA>
## 5                                                                             <NA>
## 6 The Cairo Chief of Police, William Petersen, resigned as a result of the attack.
##                                                    scite1
## 1                                                    <NA>
## 2                                                    <NA>
## 3                                                    <NA>
## 4                                                    <NA>
## 5                                                    <NA>
## 6 "Police Chief Quits," Washington Post, January 2, 1970.
##                                                                                    scite2
## 1                                                                                    <NA>
## 2                                                                                    <NA>
## 3                                                                                    <NA>
## 4                                                                                    <NA>
## 5                                                                                    <NA>
## 6 "Cairo Police Chief Quits; Decries Local 'Militants'," Afro-American, January 10, 1970.
##                                                                                                                          scite3
## 1                                                                                                                          <NA>
## 2                                                                                                                          <NA>
## 3                                                                                                                          <NA>
## 4                                                                                                                          <NA>
## 5                                                                                                                          <NA>
## 6 Christopher Hewitt, "Political Violence and Terrorism in Modern America: A Chronology," Praeger Security International, 2005.
##         dbsource INT_LOG INT_IDEO INT_MISC INT_ANY related
## 1           PGIS       0        0        0       0    <NA>
## 2           PGIS       0        1        1       1    <NA>
## 3           PGIS      -9       -9        1       1    <NA>
## 4           PGIS      -9       -9        1       1    <NA>
## 5           PGIS      -9       -9        1       1    <NA>
## 6 Hewitt Project      -9       -9        0      -9    <NA>
nrow(terrorism)
## [1] 170350

The imported dataset describes terrorist attacks around the world across many fields. From the table above, we can look up the time and location of each incident, the attack type, the attack target and other significant information. In all, it has over 170,000 records (one per terrorist incident) with 135 variables. The original dataset covers 1970 to 2016, but in our project we take two subsets: one from 1970 to 1974 and another from 2012 to 2016. The whole dataset is too large and complex to yield satisfactory results without a clear focus, so we selected these two periods as comparison groups for our research. After subsetting, we have 68,366 records.

combine <- subset(terrorism, iyear<=1974 | iyear>=2012)
nrow(combine)
## [1] 68366
terrorism_ <- combine[,c("iyear","imonth", "iday", "country_txt", "region_txt", "provstate", "city", "latitude", "longitude", "attacktype1_txt", "targtype1_txt")]

3.2 imputations for NULL values

library(extracat)
visna(combine, sort = "b")

visna(terrorism_, sort = "b")

The first picture is based on the subset drawn from the two time periods mentioned above and contains every variable in the original dataset. The graph describes the pattern of missing values. We can conclude that the dataset has many missing values, but they are not evenly distributed: some variables are almost entirely missing, while others have no missing values at all. In the second graph, we keep only the variables we plan to use in our project and draw the missing-value plot again. For the variables we are interested in, the dataset is satisfactory and essentially complete, so we do not need to worry about bias from missing values.
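
The visual impression from the missing-value plots can be backed up with numbers. Below is a minimal sketch in base R that computes the share of missing values per column. The `toy` data frame and its values are invented for illustration; only the column names are borrowed from the GTD.

```r
# Toy data frame with missing values in some columns (hypothetical values).
toy <- data.frame(
  city     = c("Athens", "Cairo", NA, "Unknown"),
  latitude = c(37.98, 37.01, NA, NA),
  nkill    = c(NA, 0, 1, NA),
  iyear    = c(1970, 1970, 1970, 1970)
)

# Share of missing values per column, sorted from most to least incomplete.
na_share <- sort(colMeans(is.na(toy)), decreasing = TRUE)
round(na_share, 2)

# Columns with no missing values at all.
complete_cols <- names(na_share)[na_share == 0]
complete_cols
```

Applied to `combine` or `terrorism_`, the same two lines would list which variables are safe to use without imputation.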

3.3 Outlier Analysis and Treatment

# 1970-1974 subset (terrorism1)
boxplot(imonth ~ iyear, data=terrorism1, main="Quantity of Terrorist Incidents in each Month across Years 1970-1974")

boxplot(iday ~ imonth, data=terrorism1, main="Quantity of Terrorist Incidents in each Day across Months 1970-1974")

# 2012-2016 subset (terrorism2)
boxplot(imonth ~ iyear, data=terrorism2, main="Quantity of Terrorist Incidents in each Month across Years 2012-2016")

boxplot(iday ~ imonth, data=terrorism2, main="Quantity of Terrorist Incidents in each Day across Months 2012-2016")

According to the boxplots, there are no outliers. Within each year of the sampled periods, terrorist attacks are spread fairly evenly across months, and within each month they are spread fairly evenly across days.
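
The boxplots above summarise the month and day values themselves. A complementary check is to count incidents per month and test those counts for outliers with the usual 1.5 x IQR rule. The sketch below uses invented records, and it assumes that `imonth == 0` marks an unknown month, which is how we read the zeros in the `head()` output.

```r
# Toy incident records (hypothetical); imonth == 0 is treated as "month unknown".
toy <- data.frame(
  iyear  = rep(1970, 30),
  imonth = c(0, rep(1, 3), rep(2, 4), rep(3, 3), rep(4, 4), rep(5, 15))
)

# Drop records whose month is unknown before counting.
known <- toy[toy$imonth > 0, ]

# Incidents per month as a named numeric vector.
monthly <- table(known$imonth)
counts <- setNames(as.vector(monthly), names(monthly))

# boxplot.stats() reports values beyond 1.5 * IQR from the hinges in $out.
outliers <- boxplot.stats(counts)$out
outliers  # here: month 5, with 15 incidents
```

An empty `outliers` vector on the real monthly counts would support the conclusion drawn from the plots.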

4 Main Analysis (focus on quality of EDA choices / techniques)

4.1 A heatmap of total number of terrorist attacks worldwide

To compare the total number of terrorist attacks worldwide, we draw the following two heatmaps.

# Count attacks per country for 1970-1974
a <- rle(sort(terrorism1$country_txt))
b <- data.frame(country_txt=a$values, number_of_terrorism_attacks_from_1970_to_1974 = a$lengths)

gtdMap <- joinCountryData2Map( b,
                               nameJoinColumn="country_txt",
                               joinCode="NAME" )
## 67 codes from your data successfully matched countries in the map
## 9 codes from your data failed to match with a country code in the map
## 176 codes from the map weren't represented in your data
mapCountryData( gtdMap,
                nameColumnToPlot='number_of_terrorism_attacks_from_1970_to_1974',
                catMethod='fixedWidth',
                numCats=100 )

# Count attacks per country for 2012-2016
a <- rle(sort(terrorism2$country_txt))
b <- data.frame(country_txt=a$values, number_of_terrorism_attacks_from_2012_to_2016 = a$lengths)

gtdMap <- joinCountryData2Map( b,
                               nameJoinColumn="country_txt",
                               joinCode="NAME" )
## 137 codes from your data successfully matched countries in the map
## 1 codes from your data failed to match with a country code in the map
## 106 codes from the map weren't represented in your data
#mapDevice('x11')
mapCountryData( gtdMap,
                nameColumnToPlot='number_of_terrorism_attacks_from_2012_to_2016',
                catMethod='fixedWidth',
                numCats=100 )

In order to visualize the total number of terrorist attacks throughout the years, we also draw a supplementary line chart to facilitate comparison.

#attack trends 1970-1974
year_1970_1974 <- terrorism1 %>% group_by(iyear) %>% summarise(n=n())
p1<-ggplot(aes(x = iyear, y = n), data = year_1970_1974) + geom_line() + xlab("Year") + ylab("Number of Terrorist Attacks") + ggtitle("Global Terrorist Attacks From 1970 To 1974")

#attack trends 2012-2016
year_2012_2016 <- terrorism2 %>% group_by(iyear) %>% summarise(n=n())
p2<-ggplot(aes(x = iyear, y = n), data = year_2012_2016) + geom_line() + xlab("Year") + ylab("Number of Terrorist Attacks") + ggtitle("Global Terrorist Attacks From 2012 To 2016")

# Stack the two line charts vertically
grid.arrange(p1, p2, nrow = 2)

From the plot above, although the trends differ between the two periods, there is a noticeable increase in the number of terrorist attacks in recent years compared with the 70s. During 2012-2016, the maximum yearly number of attacks was around 17,000 and the minimum was around 8,500, while during 1970-1974 the maximum was only about 650. This suggests that there have been more destabilizing factors around the globe in recent years and that the world is more dangerous now than in the 70s.
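
The size of the gap can be made concrete with a little arithmetic. This is a minimal sketch that uses the approximate values read from the line charts above, not exact counts from the data; the variable names are our own.

```r
# Approximate yearly figures read from the line charts (not exact counts).
peak_1970s <- 650
peak_2010s <- 17000
low_2010s  <- 8500

# How many times larger the recent figures are than the 1970s peak.
ratio_peak <- round(peak_2010s / peak_1970s, 1)
ratio_low  <- round(low_2010s / peak_1970s, 1)

ratio_peak  # 26.2
ratio_low   # 13.1
```

Even the quietest year of 2012-2016 therefore had roughly 13 times as many recorded attacks as the busiest year of 1970-1974.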

4.2 Terrorist Attack Tactics Worldwide, 1970-1974; 2012-2016

We then want to explore what kinds of attacks occurred during these years.

#1970-1974
attacktype_1970_1974 <- terrorism1 %>% group_by(attacktype1_txt) %>% summarise(n=n())
p1<-ggplot(aes(x=reorder(attacktype1_txt, n), y=n), data=attacktype_1970_1974) + 
geom_bar(stat = 'identity') + xlab('Attack Type') + ylab('Number of Attacks') + ggtitle('Global Terrorist Attack Types From 1970 To 1974') + coord_flip() 

#2012-2016
attacktype_2012_2016 <- terrorism2 %>% group_by(attacktype1_txt) %>% summarise(n=n())
p2<-ggplot(aes(x=reorder(attacktype1_txt, n), y=n), data=attacktype_2012_2016) + 
geom_bar(stat = 'identity') + xlab('Attack Type') + ylab('Number of Attacks') + ggtitle('Global Terrorist Attack Types From 2012 To 2016') + coord_flip()

grid.arrange(p1, p2, nrow = 2)

From the bar charts above, we can see that Bombing/Explosion was the most frequently used method of terrorist attack during both 1970-1974 and 2012-2016. Armed Assault, Assassination, Hostage Taking (Kidnapping) and Facility/Infrastructure Attack were also among the most frequently used methods in both periods.
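
Because the two periods differ so much in total volume, raw counts are hard to compare directly. One option is to compare the share of each attack type within its period. The sketch below does this in base R on invented records; the `period` column is our own illustrative label, and the numbers do not come from the GTD.

```r
# Toy records (hypothetical counts): 5 attacks in the first period, 10 in the second.
toy <- data.frame(
  period = c(rep("1970-1974", 5), rep("2012-2016", 10)),
  attacktype1_txt = c(
    "Bombing/Explosion", "Bombing/Explosion", "Bombing/Explosion",
    "Assassination", "Armed Assault",
    rep("Bombing/Explosion", 5), rep("Armed Assault", 3),
    "Assassination", "Hostage Taking (Kidnapping)"
  )
)

# Cross-tabulate attack type by period, then convert each column to shares.
counts <- table(toy$attacktype1_txt, toy$period)
shares <- prop.table(counts, margin = 2)

# Percentages per period; each column sums to 100.
round(100 * shares, 1)
```

Shares make it possible to say whether a tactic became relatively more common, independent of the overall rise in attacks.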

4.3 Terrorist Attack Targets in 1970 and 2016

There are many different terrorist attack targets in our dataset, such as Government, Military, Educational Institutions and so on. We would like to compare the top 10 terrorist attack targets in the first year (1970) and the last year (2016) of our dataset. The variation in terrorist attack targets may be related to policy issues, motivations, religion, etc.

attack1970 <- terrorism[terrorism$iyear==1970, ]
by_target <- attack1970 %>% group_by(targtype1_txt) %>% 
  summarise(n=n())
by_target <- arrange(by_target, desc(n))%>% head(10)

p1<-ggplot(aes(x=reorder(targtype1_txt, n), y=n), data=by_target) +
  geom_bar(stat = 'identity') + ggtitle('Attack Targets, 1970') +
  coord_flip() + theme_fivethirtyeight()

attack2016 <- terrorism[terrorism$iyear==2016, ]
by_target <- attack2016 %>% group_by(targtype1_txt) %>% 
  summarise(n=n())
by_target <- arrange(by_target, desc(n))%>% head(10)

p2<-ggplot(aes(x=reorder(targtype1_txt, n), y=n), data=by_target) +
  geom_bar(stat = 'identity') + ggtitle('Attack Targets, 2016') +
  coord_flip() + theme_fivethirtyeight()

grid.arrange(p1, p2, nrow = 1)

The diversity of terrorist attack targets is higher in 2016 than in 1970. The biggest difference is that private citizens and property became the most frequent target in 2016, whereas this category does not even appear among the top 10 targets in 1970. This may be because personal hatred has grown rapidly over these years.

The top 10 terrorist attack targets require the most homeland security resources. Policymakers can use these and other results to focus their counter-terrorism measures.
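
Which target types enter or leave the top list can also be checked programmatically rather than by eye. The following is a minimal sketch in base R; `top_n_names()` is a hypothetical helper we define here, and the target vectors are invented examples rather than GTD results.

```r
# Hypothetical helper: names of the n most frequent values in x.
top_n_names <- function(x, n = 10) {
  counts <- sort(table(x), decreasing = TRUE)
  names(head(counts, n))
}

# Toy target-type vectors for two years (invented values).
targets_1970 <- c("Business", "Business", "Business",
                  "Military", "Military", "Police")
targets_2016 <- c("Private Citizens & Property", "Private Citizens & Property",
                  "Private Citizens & Property", "Military", "Military", "Business")

# Use n = 2 so the toy example stays readable; the analysis above uses 10.
top_1970 <- top_n_names(targets_1970, n = 2)
top_2016 <- top_n_names(targets_2016, n = 2)

# Target types that are new in the later list, and those that dropped out.
new_in_2016     <- setdiff(top_2016, top_1970)
dropped_by_2016 <- setdiff(top_1970, top_2016)
new_in_2016
dropped_by_2016
```

On the real data, `top_n_names(attack1970$targtype1_txt)` and `top_n_names(attack2016$targtype1_txt)` would be the inputs.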

4.4 Which cities were the most dangerous in 1970-1974 and 2012-2016

Next, we dig deeper to see how the most dangerous cities differ between 1970-1974 and 2012-2016. Cities are ranked by their total number of terrorist attacks.

attack_by_city <- terrorism1 %>% group_by(country_txt, city) %>% 
  summarise(n=n())
attack_by_city <- arrange(attack_by_city, desc(n))
top10_city<- head(attack_by_city, 10)

p1<-ggplot(data=top10_city, aes(x=reorder(city,-n), y=n, group=1)) +
  geom_line()+
  geom_point() + xlab("City") + ylab("Number of terrorist Attacks") +
        ggtitle("Top 10 Most Dangerous Cities, 1970-1974") + theme_fivethirtyeight()

attack_by_city <- terrorism2 %>% group_by(country_txt, city) %>% 
  summarise(n=n())
attack_by_city <- arrange(attack_by_city, desc(n))
top10_city<- head(attack_by_city, 10)

p2<-ggplot(data=top10_city, aes(x=reorder(city,-n), y=n, group=1)) +
  geom_line()+
  geom_point() + xlab("City") + ylab("Number of terrorist Attacks") +
        ggtitle("Top 10 Most Dangerous Cities, 2012-2016") + theme_fivethirtyeight()
grid.arrange(p1, p2, nrow = 2)

The most dangerous cities are very different in these two periods. During 1970-1974, the most dangerous cities were in the United Kingdom, the United States, Argentina, Uruguay and Turkey. Belfast was the most dangerous city in 1970-1974, with approximately 400 terrorist attacks in all.

During 2012-2016, the most dangerous cities are in Iraq, Pakistan, Somalia, Afghanistan, Libya and Yemen. Baghdad was the most dangerous city in 2012-2016, with approximately 4000 terrorist attacks in all. In addition, the total number of attacks increased rapidly, which may be caused by wars and other political issues.
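The ranking step is repeated verbatim for both periods, so it can be wrapped in a small helper. The sketch below is only an illustration: `top_cities` is a hypothetical name that does not appear in the original analysis, and the toy data frame stands in for `terrorism1` or `terrorism2` with made-up values.

```r
library(dplyr)

# Hypothetical helper (not part of the original analysis):
# rank cities by their number of recorded attacks and keep the top k.
top_cities <- function(df, k = 10) {
  df %>%
    count(country_txt, city) %>%  # one row per (country, city), count in column n
    arrange(desc(n)) %>%
    head(k)
}

# Toy data standing in for terrorism1 / terrorism2 (values are invented).
toy <- data.frame(
  country_txt = c("A", "A", "A", "B", "B", "C"),
  city        = c("X", "X", "X", "Y", "Y", "Z"),
  stringsAsFactors = FALSE
)

top2 <- top_cities(toy, k = 2)
print(top2)
```

Calling `top_cities(terrorism1)` and `top_cities(terrorism2)` would then give the two tables that feed the plots, assuming both data frames carry the `country_txt` and `city` columns used above.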

4.5 Which regions were the most dangerous in 1970-1974 and 2012-2016

Since we have already explored terrorist attack methods and targets in the previous analysis, we also want to know which regions have been targeted the most.

#1970-1974
region_1970_1974 <- terrorism1 %>% group_by(region_txt) %>% summarise(n=n())
p1<-ggplot(aes(x=reorder(region_txt, n), y=n), data=region_1970_1974) + geom_bar(stat = 'identity') + xlab('Region') + ylab('Number of Terrorist Attacks From 1970 To 1974') + ggtitle('Terrorist Attacks Region From 1970 to 1974') + coord_flip()

#2012-2016
region_2012_2016 <- terrorism2 %>% group_by(region_txt) %>% summarise(n=n())
p2<-ggplot(aes(x=reorder(region_txt, n), y=n), data=region_2012_2016) + geom_bar(stat = 'identity') + xlab('Region') + ylab('Number of Terrorist Attacks From 2012 To 2016') + ggtitle('Terrorist Attacks Region From 2012 to 2016') + coord_flip()

grid.arrange(p1, p2, nrow = 2)

The regions where terrorist attacks occur have also changed a lot over the past decades. During 1970-1974, the region with the most terrorist attacks was Western Europe, followed by North America, South America, Middle East & North Africa, etc. During 2012-2016, Middle East & North Africa became the region with the largest number of terrorist attacks, followed by South Asia, Sub-Saharan Africa, Southeast Asia, etc. The change may be caused by the unstable political situation in the Middle East & North Africa region and by the dramatic change in the world order over the past decades.
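Because the two periods differ greatly in their total number of attacks, comparing each region's share within a period can complement the raw counts. The following is a minimal sketch under that assumption; `region_share` is a hypothetical helper name and the toy data are invented.

```r
library(dplyr)

# Hypothetical helper: share of attacks per region within one period.
region_share <- function(df) {
  df %>%
    count(region_txt) %>%            # attacks per region, in column n
    mutate(share = n / sum(n)) %>%   # fraction of the period's total
    arrange(desc(share))
}

# Toy data standing in for one period (values are invented).
toy_period <- data.frame(
  region_txt = c(rep("Western Europe", 5),
                 rep("North America", 3),
                 rep("South America", 2)),
  stringsAsFactors = FALSE
)

shares <- region_share(toy_period)
print(shares)
```

Applying the same helper to `terrorism1` and `terrorism2` would put both periods on a common 0 to 1 scale, so a region's rank change is not confounded with the overall growth in attacks.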

4.6 Terrorist Attacks distribution by Region and Year 1970-1974 & 2012-2016

We are also curious about the worldwide distribution by region in each year of the two time ranges, 1970-1974 and 2012-2016. This series of bar charts helps answer questions such as: is the regional distribution of terrorist attacks similar from year to year within each time range? Are there any major differences between the two time ranges in the distribution by region? To answer these questions, we built the bar charts below.

#terrorism1970 <- read.csv('../data/terrorism1.csv', stringsAsFactors = F)
#terrorism2012 <- read.csv('../data/terrorism2.csv', stringsAsFactors = F)
combine <- subset(terrorism, iyear<=1974 | iyear>=2012)
by_year <- combine %>% group_by(iyear,region_txt) %>% 
  summarise(n=n())
ggplot(by_year, aes(x = region_txt, y = n, fill = n)) + 
  geom_bar(stat = 'identity') +
  facet_wrap(~iyear) + xlab('Region') + ylab('Number of Attacks') + coord_flip() +
  ggtitle('Terrorist Attacks by Region and Year 1970-1974 & 2012-2016') + 
  theme(legend.position="none")

by_year <- terrorism1 %>% group_by(iyear,region_txt) %>% 
  summarise(n=n())
ggplot(by_year, aes(x = region_txt, y = n, fill = n)) + 
  geom_bar(stat = 'identity') +
  facet_wrap(~iyear) + xlab('Region') + ylab('Number of Attacks') + coord_flip() +
  ggtitle('Terrorist Attacks by Region and Year 1970-1974') + 
  theme(legend.position="none")

by_year <- terrorism2 %>% group_by(iyear,region_txt) %>% 
  summarise(n=n())
ggplot(by_year, aes(x = region_txt, y = n, fill = n)) + 
  geom_bar(stat = 'identity') +
  facet_wrap(~iyear) + xlab('Region') + ylab('Number of Attacks') + coord_flip() +
  ggtitle('Terrorist Attacks by Region and Year 2012-2016') + 
  theme(legend.position="none")

From the graphs above, we can conclude that the world sees far more terrorist attacks than it did 40 years ago. In the 1970s, North America and Western Europe were the most dangerous regions in the world, while in the 2010s, Middle East & North Africa and South Asia became the main target regions of terrorists. In other words, the pattern has been completely overturned.

We also plotted pie charts to compare each region's share of the terrorist attacks that occurred worldwide.

NAmericadist <- terrorism1 %>% group_by(region_txt) %>% 
  summarise(n=n())
rgb.palette <- colorRampPalette(c(rgb(230,247,255,max=255),rgb(35,179,225,max=255)),space = "rgb")

bp<- ggplot(NAmericadist, aes(x="", y=n, fill=region_txt))+
geom_bar(width = 1, stat = "identity") + 
  scale_fill_manual(values=rgb.palette(11)) 
pie <- bp + coord_polar("y", start=0)
pie + theme(axis.text.x=element_blank(), axis.title.x = element_blank(),axis.title.y = element_blank()) +
ggtitle('Terrorist Attacks Distribution by Region and Year 1970-1974') 

ramp <- colorRamp(c("white", "red"))  
red <- rgb( ramp(seq(0, 1, length = 12)), max = 255)
MENAdist <- terrorism2 %>% group_by(region_txt) %>% 
  summarise(n=n())
bp2 <- ggplot(MENAdist, aes(x="", y=n, fill=region_txt))+
geom_bar(width = 1, stat = "identity") +
  scale_fill_manual(values=red) 
pie <- bp2 + coord_polar("y", start=0)
pie + theme(axis.text.x=element_blank(), axis.title.x = element_blank(),
  axis.title.y = element_blank()) + ggtitle('Terrorist Attacks Distribution by Region and Year 2012-2016') 

The pie charts above show that compared with 40 years ago, the target regions of terrorist attacks became more diverse, and the areas involved are more extensive.
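One way to put a number on "more diverse" is the Shannon entropy of the regional distribution: it is higher when attacks are spread evenly over many regions and lower when a few regions dominate. This measure is not used in the original analysis; the sketch below is only an illustration, with a hypothetical helper `shannon_entropy` and invented toy labels.

```r
# Shannon entropy (in bits) of a vector of category labels.
# Hypothetical helper, not part of the original analysis.
shannon_entropy <- function(labels) {
  counts <- table(labels)
  p <- as.numeric(counts) / sum(counts)
  p <- p[p > 0]                 # drop empty factor levels to avoid 0 * log2(0)
  -sum(p * log2(p))
}

# Toy label vectors (invented): one concentrated, one evenly spread.
concentrated <- c(rep("A", 8), "B", "C")
even_spread  <- rep(c("A", "B", "C", "D"), each = 5)

h_concentrated <- shannon_entropy(concentrated)
h_even         <- shannon_entropy(even_spread)
print(c(concentrated = h_concentrated, even = h_even))
```

Comparing `shannon_entropy(terrorism1$region_txt)` with `shannon_entropy(terrorism2$region_txt)` would give a single figure per period to set beside the visual impression from the pie charts.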

4.7 Comparison between Terrorist Attack numbers in Middle East & North Africa and North America by Year 1970-1974

Based on the analysis above, we want to pay more attention to two major regions that saw many conflicts during these years: Middle East & North Africa and North America. First, we compare the number of attacks in the two regions in 1970-1974.

NA_MENA <- subset(terrorism1, region_txt == "North America" | region_txt == "Middle East & North Africa")
n_NAMENA <- NA_MENA %>% group_by(region_txt, iyear) %>% 
  summarise(n=n())
cbPalette2 <- c("#999999", "#E69F00")
ggplot(n_NAMENA, aes(x = iyear, y = n, fill = region_txt)) +
  geom_col() +
  xlab('Year') +
  facet_wrap(~ region_txt) +
  theme(legend.position = "none") +
  scale_fill_manual(values=cbPalette2) +
ggtitle("Comparison between Terrorist Attack numbers in Middle East & North Africa and North America by Year 1970-1974")

And this is the comparison during the period of 2012-2016.

NA_MENA <- subset(terrorism2, region_txt == "North America" | region_txt == "Middle East & North Africa")
n_NAMENA <- NA_MENA %>% group_by(region_txt, iyear) %>% 
  summarise(n=n())
cbPalette2 <- c("#999999", "#E69F00")
ggplot(n_NAMENA, aes(x = iyear, y = n, fill = region_txt)) +
  geom_col() +
  xlab('Year') +
  facet_wrap(~ region_txt) +
  theme(legend.position = "none") +
  scale_fill_manual(values=cbPalette2) +
ggtitle("Comparison between Terrorist Attack numbers in Middle East & North Africa and North America by Year 2012-2016")

This contrast is consistent with the conclusion we reached in the previous part.

4.8 Types of Terrorist Attacks in North America and Middle East & North Africa by Year 1970-1974

Now that we know how the number of attacks is distributed across years, we want to know more details about these terrorist activities. The bar charts below show the contribution of each attack type in a given year and region.

cbPalette9 <- c("#000000", "#999999", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")
NorthAmerica <- terrorism1[terrorism1$region_txt=='North America', ]
NorthAmerica_type <- NorthAmerica %>% group_by(attacktype1_txt, iyear) %>% 
  summarise(n=n())
ggplot(aes(x = iyear, y = n, fill = attacktype1_txt), data = NorthAmerica_type) + geom_bar(stat = 'identity') +
scale_fill_manual(values=cbPalette9) +
ggtitle("Terrorist Attacks in North America by Year 1970-1974")

MENA <- terrorism1[terrorism1$region_txt=='Middle East & North Africa', ]
MENA_type <- MENA %>% group_by(attacktype1_txt, iyear) %>% 
  summarise(n=n())
ggplot(aes(x = iyear, y = n, fill = attacktype1_txt), data = MENA_type) + geom_bar(stat = 'identity') +
scale_fill_manual(values=cbPalette9) +  
ggtitle("Terrorist Attacks in Middle East & North Africa by Year 1970-1974")

For North America in the 1970s, explosions made up a large proportion of all attack types, and there were also many infrastructure attacks. For Middle East & North Africa in the 1970s, explosions were also a major type of attack, but the other main methods were armed assault, kidnapping and hijacking.

4.9 Types of Terrorist Attacks in North America and Middle East & North Africa by Year 2012-2016

NorthAmerica <- terrorism2[terrorism2$region_txt=='North America', ]
NorthAmerica_type <- NorthAmerica %>% group_by(attacktype1_txt, iyear) %>% 
  summarise(n=n())
ggplot(aes(x = iyear, y = n, fill = attacktype1_txt), data = NorthAmerica_type) + geom_bar(stat = 'identity') +
scale_fill_manual(values=cbPalette9) +  
ggtitle("Terrorist Attacks in North America by Year 2012-2016")

MENA <- terrorism2[terrorism2$region_txt=='Middle East & North Africa', ]
MENA_type <- MENA %>% group_by(attacktype1_txt, iyear) %>% 
  summarise(n=n())
ggplot(aes(x = iyear, y = n, fill = attacktype1_txt), data = MENA_type) + geom_bar(stat = 'identity') +
scale_fill_manual(values=cbPalette9) +  
ggtitle("Terrorist Attacks in Middle East & North Africa by Year 2012-2016")

Compared with 40 years ago, in North America the percentage of explosions decreased and infrastructure attacks became the most common attack type. In Middle East & North Africa, however, the distribution has not changed much, and the number of explosions is overwhelmingly large.
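Since the comparison above is about percentages rather than raw counts, it can help to compute each attack type's share within a year explicitly. The sketch below assumes a data frame with the `iyear` and `attacktype1_txt` columns used in this section; `type_share_by_year` is a hypothetical helper and the toy data are invented.

```r
library(dplyr)

# Hypothetical helper: share of each attack type within each year.
type_share_by_year <- function(df) {
  df %>%
    count(iyear, attacktype1_txt) %>%  # attacks per (year, type), in column n
    group_by(iyear) %>%
    mutate(share = n / sum(n)) %>%     # fraction of that year's attacks
    ungroup()
}

# Toy data standing in for one region's records (values are invented).
toy_attacks <- data.frame(
  iyear = c(1970, 1970, 1970, 1970, 1971, 1971),
  attacktype1_txt = c("Bombing/Explosion", "Bombing/Explosion",
                      "Bombing/Explosion", "Armed Assault",
                      "Bombing/Explosion", "Armed Assault"),
  stringsAsFactors = FALSE
)

type_shares <- type_share_by_year(toy_attacks)
print(type_shares)
```

The same proportions can also be drawn directly by passing `position = "fill"` to `geom_bar()`, which rescales every stacked bar to a height of 1.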

4.10 Comparison of Terrorist Attacks Distribution by Country in North America& Middle East & North Africa and Year 1970-1974&2012-2016

Next, we combine the two regions to explore the contribution of each country within them.

#install.packages("viridis")
library(viridis)
## Warning: package 'viridis' was built under R version 3.4.4
## Loading required package: viridisLite
NA_MENA <- subset(terrorism1, region_txt == "North America" | region_txt == "Middle East & North Africa")
n_NAMENA <- NA_MENA %>% group_by(country_txt) %>% 
  summarise(n=n())
ramp <- colorRamp(c("purple",'mediumpurple','white'))  
color <- rgb( ramp(seq(0, 1, length = 17)), max = 255)
bp<-ggplot(n_NAMENA, aes(x="", y=n, fill=country_txt))+
geom_bar(width = 1, stat = "identity") +
  scale_fill_manual(values=color) 
pie <- bp + coord_polar("y", start=0)
pie + theme(axis.text.x=element_blank(), axis.title.x = element_blank(), axis.title.y = element_blank()) + ggtitle('Terrorist Attacks Distribution by Country in North America& Middle East & North Africa and Year 1970-1974') 

NA_MENA <- subset(terrorism2, region_txt == "North America" | region_txt == "Middle East & North Africa")
n_NAMENA <- NA_MENA %>% group_by(country_txt) %>% 
  summarise(n=n())
ramp <- colorRamp(c("black",'grey','white'))  
color2 <- rgb( ramp(seq(0, 1, length = 22)), max = 255)
bp<- ggplot(n_NAMENA, aes(x="", y=n, fill=country_txt))+
geom_bar(width = 1, stat = "identity") +
  scale_fill_manual(values=color2) 
pie <- bp + coord_polar("y", start=0)
pie + theme(axis.text.x=element_blank(), axis.title.x = element_blank(),
  axis.title.y = element_blank()) + ggtitle('Terrorist Attacks Distribution by Country in North America& Middle East & North Africa and Year 2012-2016') 

It is clear that Iraq replaced the United States as the most dangerous country among North America and Middle East & North Africa. This is probably due to the influence of the Iraq War, and we can infer that most of the risk has shifted from the United States to the Middle East.

4.11 Comparison of Terrorist Attacks Distribution by Country in North America and Year 1970-1974 & 2012-2016

Building on this, we look at the two regions separately and analyze the relative safety of the countries within each region.

NAmerica <- subset(terrorism1, region_txt == "North America")
n_NA <- NAmerica %>% group_by(country_txt) %>% 
  summarise(n=n())
cbPalette3 <- c("#999999", "#56B4E9", "#D55E00")
bp<- ggplot(n_NA, aes(x="", y=n, fill=country_txt))+
geom_bar(width = 1, stat = "identity")+
   scale_fill_manual(values=cbPalette3) 
pie <- bp + coord_polar("y", start=0)
pie + theme(axis.text.x=element_blank(), axis.title.x = element_blank(),
  axis.title.y = element_blank()) + ggtitle('Terrorist Attacks Distribution by Country in North America and Year 1970-1974') 

NAmerica <- subset(terrorism2, region_txt == "North America")
n_NA <- NAmerica %>% group_by(country_txt) %>% 
  summarise(n=n())
bp<- ggplot(n_NA, aes(x="", y=n, fill=country_txt))+
geom_bar(width = 1, stat = "identity")+
   scale_fill_manual(values=cbPalette3) 
pie <- bp + coord_polar("y", start=0)
pie + theme(axis.text.x=element_blank(), axis.title.x = element_blank(),
  axis.title.y = element_blank()) + ggtitle('Terrorist Attacks Distribution by Country in North America and Year 2012-2016') 

According to the pie charts, although the United States is still the main attack target in North America, the shares of Mexico and Canada increased noticeably compared with 40 years ago. All in all, Canada is still the safest country in North America.

4.12 Comparison of Terrorist Attacks Distribution by Country in Middle East & North Africa and Year 1970-1974 & 2012-2016

NMANA <- subset(terrorism1, region_txt == "Middle East & North Africa")
n_MANA <- NMANA %>% group_by(country_txt) %>% 
  summarise(n=n())
ramp <- colorRamp(c("#000000", "#999999", "#009E73", "#F0E442", "#0072B2"))  
color <- rgb( ramp(seq(0, 1, length = 17)), max = 255)
bp<- ggplot(n_MANA, aes(x="", y=n, fill=country_txt))+
geom_bar(width = 1, stat = "identity") +
  scale_fill_manual(values=color) 
pie <- bp + coord_polar("y", start=0)
pie + theme(axis.text.x=element_blank(), axis.title.x = element_blank(),
  axis.title.y = element_blank()) + ggtitle('Terrorist Attacks Distribution by Country in Middle East & North Africa and Year 1970-1974') 

NMANA <- subset(terrorism2, region_txt == "Middle East & North Africa")
n_MANA <- NMANA %>% group_by(country_txt) %>% 
  summarise(n=n())
color <- rgb( ramp(seq(0, 1, length = 19)), max = 255)
bp<- ggplot(n_MANA, aes(x="", y=n, fill=country_txt))+
geom_bar(width = 1, stat = "identity") +
  scale_fill_manual(values=color) 
pie <- bp + coord_polar("y", start=0)
pie + theme(axis.text.x=element_blank(), axis.title.x = element_blank(),
  axis.title.y = element_blank()) + ggtitle('Terrorist Attacks Distribution by Country in Middle East & North Africa and Year 2012-2016') 

For countries in Middle East & North Africa, the risks and conflicts in the region became more concentrated. In other words, after several wars in the region during the past decades, the security situation in Middle East & North Africa has changed greatly.

5 Executive Summary (focus on quality of presentation choices / techniques)

This project provides an analysis and evaluation of global terrorist attacks during two separate time periods, 1970-1974 and 2012-2016. The main goals are to analyze terrorist attack patterns around the world and to compare the two chosen time periods. We gathered our data from an open source database, the Global Terrorism Database (GTD) http://start.umd.edu/gtd/, and used R and Shiny for data analysis and visualization. Methods of data analysis include extraction of important features and detection of outliers and missing values. Graphical analysis includes line charts, bar charts, pie charts and a world map to support the analysis.

The project draws attention to the fact that, although the trends differ between the two periods, the number of terrorist attacks increased noticeably in the recent five years compared with the 70s (Fig.1). The maximum number of terrorist attacks between 2012 and 2016 is approximately 17,000, almost 27 times the maximum between 1970 and 1974.

fig1

We also notice that among these terrorist attacks, Bombing/Explosion is the most common tactic in both time periods (Fig.2).

fig2

What stands out is that in the 70s the main target of attacks was Business, while in the 2010s Private Citizens & Property became the main target (Fig.3). This suggests that in the 70s terrorist attacks were mainly aimed at seizing wealth, especially from department stores, banks, hotels and supermarkets (Citation: https://qz.com/558597/charted-terror-attacks-in-western-europe-from-the-1970s-to-now/). In the 2010s, attacks increasingly fall on ordinary people, with many innocent people being killed. If the attacks of the 70s more or less carried a flavor of robbing the rich to help the poor, the attacks of the 2010s have degenerated into tools in the power struggles between different parties.

fig3

The most attacked region changed from Western Europe to Middle East & North Africa (Fig.4). Further investigation shows that terrorist attacks in Western Europe in the 70s were mainly bombings organized by extremist political groups in Western European countries, and were more likely a local phenomenon (Citation: https://www.tandfonline.com/doi/full/10.1080/09546550902950308?src=recsys&). Nowadays, however, terrorist attacks in Western Europe are more or less related to Islamist extremists. The increasing number of terrorist attacks in the Middle East in the past few years coincides with the region's unstable political situation, especially after the Iraq War in 2003. Countries like Afghanistan and Pakistan are also a focus of terrorists these days.

fig4

Our analysis shows that the worldwide pattern of terrorist attacks changed greatly between 1970-1974 and 2012-2016. Although Western Europe and North America were two of the regions with the most attacks in the 70s, these two areas have become much safer than they were 40 years ago thanks to political and societal stabilization. Middle East & North Africa has now become the most dangerous area in the world, with Bombing/Explosion as the main attack tactic.

6 Interactive Component

6.1 Part 1

This is the link to our Shiny App: https://ellip123.shinyapps.io/tryyy/ In this App, we plotted the relationships between different variables in the dataset and tried to uncover correlations that are hard to explore with traditional static R plots.

6.2 Part 2

citation(package = "scatterD3")
## 
## To cite package 'scatterD3' in publications use:
## 
##   Julien Barnier, Kent Russell, Mike Bostock, Susie Lu and Speros
##   Kokenes (2018). scatterD3: D3 JavaScript Scatterplot from R. R
##   package version 0.8.2.
##   https://CRAN.R-project.org/package=scatterD3
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {scatterD3: D3 JavaScript Scatterplot from R},
##     author = {Julien Barnier and Kent Russell and Mike Bostock and Susie Lu and Speros Kokenes},
##     year = {2018},
##     note = {R package version 0.8.2},
##     url = {https://CRAN.R-project.org/package=scatterD3},
##   }

In this part we concentrate on the terrorist attack types in the two major regions, North America and Middle East & North Africa, broken down by country and period. Each attack type number corresponds to a type of terrorist attack. They are:

attack type number 1: Assassination

attack type number 2: Armed Assault

attack type number 3: Bombing/Explosion

attack type number 4: Hijacking

attack type number 5: Hostage Taking (Barricade Incident)

attack type number 6: Hostage Taking (Kidnapping)

attack type number 7: Facility/Infrastructure Attack

attack type number 8: Unarmed Assault

attack type number 9: Unknown
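The mapping listed above can be kept in a named vector so that numeric codes in `attacktype1` are translated to readable labels, for example in tooltips or axis labels. This is a minimal sketch; the names `attack_type_labels` and `label_attack_type` are hypothetical and not part of the original code.

```r
# Lookup table built from the list above (code -> attack type label).
attack_type_labels <- c(
  "1" = "Assassination",
  "2" = "Armed Assault",
  "3" = "Bombing/Explosion",
  "4" = "Hijacking",
  "5" = "Hostage Taking (Barricade Incident)",
  "6" = "Hostage Taking (Kidnapping)",
  "7" = "Facility/Infrastructure Attack",
  "8" = "Unarmed Assault",
  "9" = "Unknown"
)

# Hypothetical helper: translate numeric codes into their labels.
label_attack_type <- function(code) {
  unname(attack_type_labels[as.character(code)])
}

print(label_attack_type(c(3, 6)))
```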

We use scatter plots to show the relationship between countries and attack types. In the interactive graphs below, different colors represent different countries (which also appear as the y axis labels), and different symbol shapes represent different attack types. We also make use of opacity: points with a deeper color indicate a higher value, that is, a larger number of terrorist events of a given type in a given country.

The graph can be zoomed in and out, and the control at the top right corner turns on the lasso, which lets us select the points of interest with the mouse. Furthermore, if you move your mouse over a point, the point becomes bigger and its data are displayed.
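The opacity encoding relies on many translucent points being drawn on top of each other. As a rough sketch, assuming the browser blends them with standard alpha compositing (an assumption about rendering, not something documented here for scatterD3), k points stacked at the same position with individual opacity a appear with a combined opacity of 1 - (1 - a)^k. The helper name `stacked_opacity` is hypothetical.

```r
# Combined opacity of k overlapping points, each drawn with opacity alpha.
# Assumes standard "over" alpha compositing; hypothetical helper.
stacked_opacity <- function(k, alpha = 0.1) {
  1 - (1 - alpha)^k
}

# Combined opacity for 1, 10 and 50 overlapping events at alpha = 0.1.
opacity_examples <- stacked_opacity(c(1, 10, 50))
print(round(opacity_examples, 3))
```

Under this assumption, with `point_opacity = 0.1` the combined opacity already exceeds 0.99 at about 50 overlapping events, so very large counts become hard to tell apart by color depth alone.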

NAmerica <- subset(terrorism1, region_txt == "North America")
NAmerica$countryname <- paste(NAmerica$country, NAmerica$country_txt)
scatterD3(data = NAmerica, x = attacktype1, y = countryname
, point_size = 175, point_opacity = 0.1, symbol_var = attacktype1 ,hover_size = 6, hover_opacity = 1, col_var = countryname, lasso = TRUE)

From the graph above, we find that terrorist attacks in the United States were of many types, while Canada and Mexico mostly saw Bombing/Explosion and Hostage Taking (Kidnapping) attacks in the 1970s.

NAmerica <- subset(terrorism2, region_txt == "North America")
NAmerica$countryname <- paste(NAmerica$country, NAmerica$country_txt)
scatterD3(data = NAmerica, x = attacktype1, y = countryname
, point_size = 175, point_opacity = 0.1, symbol_var = attacktype1 ,hover_size = 6, hover_opacity = 1, col_var = countryname, lasso = TRUE)

Forty years later, the picture changed: each country in North America saw more Armed Assault, Bombing/Explosion and Hostage Taking (Kidnapping) attacks in the 2010s.

MENA <- subset(terrorism1, region_txt == "Middle East & North Africa")
MENA$countryname <- paste(MENA$country, MENA$country_txt)
scatterD3(data = MENA, x = attacktype1, y = countryname
, point_size = 175, point_opacity = 0.1, symbol_var = attacktype1,hover_size = 6, hover_opacity = 1, col_var = countryname, lasso = TRUE)

From the graph above, we find that in the 1970s most countries in Middle East & North Africa had no recorded terrorist attacks, and the most common type of attack was Bombing/Explosion.

MENA <- subset(terrorism2, region_txt == "Middle East & North Africa")
MENA$countryname <- paste(MENA$country, MENA$country_txt)
scatterD3(data = MENA, x = attacktype1, y = countryname
, point_size = 175, point_opacity = 0.1, symbol_var = attacktype1,hover_size = 6, hover_opacity = 1, col_var = countryname, lasso = TRUE)

But 40 years later, the whole region had become one of the most dangerous in the world. Diverse types of attacks occurred in the region, including Armed Assault, Bombing/Explosion, Hostage Taking (Kidnapping) and Facility/Infrastructure Attack.

In the future, we want to do more research in the field of international security.

7 Conclusion

In our project, we used the Global Terrorism Database for data analysis and data visualization. We carried out a graphical analysis of data quality, including imputation of NULL values and outlier analysis and treatment. We then focused on several main questions. For example, we compared the total number of terrorist attacks worldwide, the most dangerous places, terrorist attack tactics and terrorist attack targets by year. We also compared the types of terrorist attacks and their distribution by region. In addition, we used D3 to build an interactive data visualization of the heatmap by year.